Changing the Transport¶
Just as you can change the serialiser by importing a new one and passing it to Dataset, you can do the same with the Transport.
Use Cases¶
The main reason to swap transport is that the default rsync does not work on your system. This can be related to the remote machine, the connection, or an outdated version.
Important
remotemanager requires rsync --version >= 3.0.0. macOS devices may run an outdated version. To fix this, you can either update your install (slower, but a permanent fix), or swap to scp (fast, but required for each Dataset).
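To decide whether a swap is needed, the installed rsync version can be checked from Python. The following is a minimal sketch and not part of remotemanager: the helper names `parse_rsync_version` and `local_rsync_ok` are hypothetical, and it only inspects the local machine, so an outdated rsync on the remote would not be detected.

```python
import re
import shutil
import subprocess

MIN_RSYNC = (3, 0, 0)  # minimum version stated in the note above


def parse_rsync_version(output):
    """Extract (major, minor, patch) from `rsync --version` output, or None."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def local_rsync_ok():
    """Return True if a local rsync exists and meets the minimum version."""
    if shutil.which("rsync") is None:
        return False
    result = subprocess.run(
        ["rsync", "--version"], capture_output=True, text=True, check=False
    )
    version = parse_rsync_version(result.stdout)
    return version is not None and version >= MIN_RSYNC
```

If `local_rsync_ok()` returns False, passing `transport=scp()` to the Dataset, as shown later in this tutorial, is the quick workaround.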
Even if you have no issues, it is possible to customise the transport further by setting Flags directly. This is an alternative method to that shown in the flags tutorial.
Importing¶
Just like with serialdill, serialjson, etc., you may import from the available Transport methods:
rsync
scp
cp
Of these, cp is less useful as it is unable to connect to external machines. It is provided for the edge case where you require no remote connection and the machine has neither rsync nor scp, and as a very simple template for creating your own Transport.
To start, we can set up a run just as normal. The transport is a drop-in replacement: it has no effect on the Dataset other than the command that actually gets used to send and retrieve data.
[1]:
from remotemanager import Dataset
[2]:
def function(x, y):
    return x * y
Since rsync is the default, let's swap to scp:
[3]:
from remotemanager.transport import scp
[4]:
ds = Dataset(
    function,
    skip=False,
    transport=scp(),  # new option!
)
Note
Like the serialiser and URL, the transport object must be instantiated (“called”) at some point after import.
[5]:
ds.append_run({"x": 21, "y": 2})
ds.run()
ds.wait(1, 10)
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
[6]:
ds.fetch_results()
ds.results
Fetching results
Transferring 2 Files... Done
[6]:
[42]
Verification¶
Right now it looks like nothing has changed, so we have to do some digging to see if it worked.
A quick way is to check the transport property:
[7]:
ds.transport
[7]:
<remotemanager.transport.scp.scp at 0x7fd51869f310>
That reads scp, so it’s the right module at least. But we want to see some commands. Let’s search the cmd_history for commands containing scp:
[8]:
for cmd in ds.url.cmd_history:
    if "scp" in cmd.sent:
        print(cmd.sent)
        break
scp -r /home/test/remotemanager/docs/source/tutorials/temp_runner_local/{dataset-991e1c92-master.sh,dataset-991e1c92-repo.py,dataset-991e1c92-repo.sh,dataset-991e1c92-runner-0-jobscript.sh,dataset-991e1c92-runner-0-run.py} temp_runner_remote/
And there we have our first scp call, sending data from the local to the remote directory.
Flags¶
As mentioned at the top, it is possible to set the flags of the transport directly at initialisation, using the flags keyword:
[9]:
ds = Dataset(
    function,
    skip=False,
    transport=scp(flags="-v"),  # new option!
)
[10]:
ds.transport.flags
[10]:
-v
Custom Transport¶
Just like with Serialiser, it is possible to create your own transport class.
This can be done by subclassing the base Transport class and adding the necessary overrides (usually just the cmd method).
cmd¶
When overriding the cmd method of Transport, there is a pattern to follow.
The docstring of the base-level method explains this in detail.
In short, the function should return a valid command in string form and accept two arguments, primary and secondary. Both are strings.
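The pattern can be sketched as follows. This is an illustration only: `TransportStandIn` is a hypothetical stand-in written for this example, not the library's base class, and the real base class may require further overrides or treat flags as an object rather than a plain string. The only parts taken from the text above are that cmd accepts primary and secondary as strings and returns a command string.

```python
class TransportStandIn:
    """Hypothetical stand-in for the real Transport base class (assumption)."""

    def __init__(self, flags=""):
        # Assumed here to be a plain string such as "-v".
        self.flags = flags

    def cmd(self, primary, secondary):
        raise NotImplementedError


class LocalCopy(TransportStandIn):
    """Example override: build a recursive `cp` command as a string."""

    def cmd(self, primary, secondary):
        # primary is the source, secondary the destination; both are strings.
        parts = ["cp", "-r", self.flags, primary, secondary]
        # Drop empty entries so that missing flags leave no double space.
        return " ".join(part for part in parts if part)
```

For instance, `LocalCopy(flags="-v").cmd("src/{a,b}", "dest/")` yields the string `cp -r -v src/{a,b} dest/`, which is then handed to the shell unchanged.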
primary¶
This argument will come “preformatted” in bash syntax, for example directory_name/{file1,file2,file3,...,fileN}
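To make the shape of this string concrete, here is a small sketch of how such a brace-expansion path could be assembled. The helper `format_primary` is hypothetical and not the library's own formatter, whose handling of edge cases may differ. The single-file branch is an assumption based on bash behaviour: braces around one item are not expanded, so a plain path is returned in that case.

```python
from typing import Sequence


def format_primary(directory: str, filenames: Sequence[str]) -> str:
    """Build a bash brace-expansion path such as dir/{file1,file2}."""
    if not filenames:
        raise ValueError("at least one filename is required")
    base = directory.rstrip("/")
    if len(filenames) == 1:
        # bash leaves `{single}` unexpanded, so emit a plain path instead.
        return base + "/" + filenames[0]
    return base + "/{" + ",".join(filenames) + "}"
```

A cmd override can treat the result as an opaque string and place it directly into the command, as the scp call shown earlier in this tutorial does.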